Efficient Algorithms for Decision Tree Cross-validation

Authors

  • Hendrik Blockeel
  • Jan Struyf
Abstract

Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. The analysis is supported by experimental results.
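The overhead mentioned in the abstract can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the algorithm from the paper: it evaluates one candidate split test and compares a straightforward approach (one data pass per fold) with a shared approach (one pass in total, then subtraction). The function names, the round-robin fold assignment, and the toy data are all hypothetical.

```python
from collections import Counter

def fold_of(index, k):
    """Assign example `index` to one of k folds (round-robin; an illustrative choice)."""
    return index % k

def naive_training_counts(examples, k, test):
    """Straightforward approach: one full pass over the data for every fold."""
    results = []
    for fold in range(k):
        counts = Counter()
        for i, (x, label) in enumerate(examples):
            if fold_of(i, k) != fold:  # example is in this fold's training set
                counts[(test(x), label)] += 1
        results.append(counts)
    return results

def shared_training_counts(examples, k, test):
    """Single pass: count per held-out fold, then subtract each fold from the total."""
    per_fold = [Counter() for _ in range(k)]
    total = Counter()
    for i, (x, label) in enumerate(examples):
        key = (test(x), label)
        per_fold[fold_of(i, k)][key] += 1
        total[key] += 1
    # Training set of fold f = all examples minus the examples held out in fold f.
    return [total - per_fold[f] for f in range(k)]

# Hypothetical toy data: (feature value, class label) pairs.
examples = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b"), (5.0, "a"), (6.0, "b")]

def is_large(x):
    """Hypothetical candidate split test."""
    return x > 3.5

naive = naive_training_counts(examples, 3, is_large)
shared = shared_training_counts(examples, 3, is_large)
print(naive == shared)
```

Both functions yield the same per-fold class counts for the candidate test, but the second reads the data once instead of once per fold. Whether the paper's adaptations work this way is not stated in the abstract.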


Similar resources

Efficient Algorithms for Decision Tree Cross-validation (Extended Abstract)

Extended abstract. Cross-validation is a generally applicable and very useful technique for many tasks often encountered in machine learning, such as accuracy estimation, feature selection or parameter tuning. A common property of these tasks is that one wants to validate a learned theory on a set of examples not used for its construction (i.e., an "independent test set"). When insufficient data a...


Efficient algorithms for decision tree cross-validation

Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the crossvalida...


Cross-Validated C4.5: Using Error Estimation for Automatic Parameter Selection

Machine learning algorithms for supervised learning are in wide use. An important issue in the use of these algorithms is how to set the parameters of the algorithm. While the default parameter values may be appropriate for a wide variety of tasks, they are not necessarily optimal for a given task. In this paper, we investigate the use of cross-validation to select parameters for the C4.5 decis...


Evaluation of Best First Decision Tree on Categorical Soil Survey Data for Land Capability Classification

Land capability classification (LCC) of a soil map unit is sought for sustainable use, management and conservation practices. The high speed, high precision and simple rule generation of machine learning algorithms can be used to construct pre-defined rules for the LCC of soil map units when developing decision support systems for land use planning of an area. Decision tree (DT) is one of the mos...


A Comparison of Accuracy between Decision Tree and k-NN Algorithm

Data mining has many functionalities. One of its main functions is classification, which is used to predict the class and generate information based on historical data. In classification, there are many algorithms that can be used to process the input into the desired output, so it is very important to observe the performance of each algorithm. The purpose of this rese...



Publication date: 2002